Smart Paper Notes

PP-YOLOE: An evolved version of YOLO

Shangliang Xu , Xinxin Wang , Wenyu Lv , Qinyao Chang , Cheng Cui , Kaipeng Deng , Guanzhong Wang , Qingqing Dang , Shengyu Wei , Yuning Du

Category: Computer Vision

2022-03-30

In this report, we present PP-YOLOE, an industrial state-of-the-art object detector with high performance and friendly deployment. We optimize on the basis of the previous PP-YOLOv2, using anchor-free paradigm, more powerful backbone and neck equipped with CSPRepResStage, ET-head and dynamic label assignment algorithm TAL. We provide s/m/l/x models for different practice scenarios. As a result, PP-YOLOE-l achieves 51.4 mAP on COCO test-dev and 78.1 FPS on Tesla V100, yielding a remarkable improvement of (+1.9 AP, +13.35% speed up) and (+1.3 AP, +24.96% speed up), compared to the previous state-of-the-art industrial models PP-YOLOv2 and YOLOX respectively. Further, PP-YOLOE inference speed achieves 149.2 FPS with TensorRT and FP16-precision. We also conduct extensive experiments to verify the effectiveness of our designs. Source code and pre-trained models are available at https://github.com/PaddlePaddle/PaddleDetection.

Translated by Google Translate
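
As a quick consistency check on the figures quoted in the PP-YOLOE abstract, the reported gains can be inverted to recover the baseline numbers they imply. The sketch below is only arithmetic on the abstract's own values; the function names are invented for illustration, and it assumes each speed-up percentage is measured relative to the baseline model's FPS.

```python
def implied_baseline_fps(new_fps: float, speedup_percent: float) -> float:
    """Baseline FPS implied by a relative speed-up.

    Assumes new_fps = baseline_fps * (1 + speedup_percent / 100).
    """
    return new_fps / (1.0 + speedup_percent / 100.0)


def implied_baseline_ap(new_ap: float, ap_gain: float) -> float:
    """Baseline AP implied by an absolute AP gain."""
    return new_ap - ap_gain


# Values quoted in the abstract for PP-YOLOE-l.
PP_YOLOE_L_AP = 51.4
PP_YOLOE_L_FPS = 78.1

# (baseline name, absolute AP gain, relative speed-up in percent)
comparisons = [("PP-YOLOv2", 1.9, 13.35), ("YOLOX", 1.3, 24.96)]

for name, ap_gain, speedup in comparisons:
    ap = implied_baseline_ap(PP_YOLOE_L_AP, ap_gain)
    fps = implied_baseline_fps(PP_YOLOE_L_FPS, speedup)
    print(f"{name}: about {ap:.1f} AP at about {fps:.1f} FPS")
```

Under that assumption the implied baselines come out at roughly 49.5 AP and 68.9 FPS for PP-YOLOv2, and 50.1 AP and 62.5 FPS for YOLOX.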

CCA-MDD: A Coupled Cross-Attention based Framework for Streaming Mispronunciation detection and diagnosis

Nianzu Zheng , Liqun Deng , Wenyong Huang , Yu Ting Yeung , Baohua Xu , Yuanyuan Guo , Yasheng Wang , Xin Jiang , Qun Liu

Category: Natural Language Processing

2021-11-16

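End-to-end models are becoming a popular approach to mispronunciation detection and diagnosis (MDD). A streaming MDD framework, which many practical applications require, remains a challenge. This paper proposes a streaming end-to-end MDD framework called CCA-MDD. CCA-MDD supports online processing and can run in real time. The encoder of CCA-MDD consists of a streaming acoustic encoder based on a conv-Transformer network and an improved cross-attention named coupled cross-attention (CCA). The coupled cross-attention integrates the encoded acoustic features with pre-encoded linguistic features. An ensemble of decoders trained with multi-task learning is applied for the final MDD decision. Experiments on publicly available corpora show that CCA-MDD achieves performance comparable to published offline end-to-end MDD models.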
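
To make the cross-attention idea in the CCA-MDD abstract concrete, here is a minimal sketch of plain scaled dot-product cross-attention, in which acoustic frames act as queries over linguistic (for example, phoneme) embeddings. This is the generic mechanism only, not the paper's coupled variant, whose details the abstract does not give; every name and toy value below is an illustrative assumption.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention on lists of vectors.

    queries: one vector per acoustic frame (hypothetical encoder output)
    keys, values: one vector per linguistic token (hypothetical embeddings)
    Returns one attended vector per query.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this acoustic frame to every linguistic token.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        attended = [
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ]
        outputs.append(attended)
    return outputs


# Toy example: one acoustic frame attending over two linguistic tokens.
acoustic = [[0.0, 0.0]]
linguistic_keys = [[1.0, 0.0], [0.0, 1.0]]
linguistic_values = [[2.0, 0.0], [0.0, 2.0]]
print(cross_attention(acoustic, linguistic_keys, linguistic_values))
```

With an all-zero query both tokens receive equal weight, so the output is the average of the two value vectors.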

PP-PicoDet: A Better Real-Time Object Detector on Mobile Devices

Guanghua Yu , Qinyao Chang , Wenyu Lv , Chang Xu , Cheng Cui , Wei Ji , Qingqing Dang , Kaipeng Deng , Guanzhong Wang , Yuning Du

Category: Computer Vision

2021-11-01

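Achieving a better trade-off between accuracy and efficiency is a challenging problem in object detection. In this work, we study the key optimizations and neural network architecture choices for object detection to improve both accuracy and efficiency. We investigate the applicability of the anchor-free strategy to lightweight object detection models. We enhance the backbone structure and design a lightweight neck, which improves the network's feature extraction ability. We improve the label assignment strategy and loss function to make training more stable and efficient. With these optimizations, we create a new family of real-time object detectors, named PP-PicoDet, which achieves superior performance on object detection for mobile devices. Compared with other popular models, our models achieve a better trade-off between accuracy and latency. PicoDet-S, with only 0.99M parameters, achieves 30.6% mAP, an absolute improvement of 4.8% mAP over YOLOX-Nano while reducing mobile CPU inference latency by 55%, and an absolute improvement of 7.1% mAP over NanoDet. It reaches 123 FPS on a mobile ARM CPU (using Paddle Lite) when the input size is 320. PicoDet-L, with only 3.3M parameters, achieves 40.9% mAP, an absolute improvement of 3.7% mAP, and is 44% faster than YOLOv5s. As shown in Figure 1, our models far outperform the state-of-the-art results for lightweight object detection. Code and pre-trained models are available at https://github.com/PaddlePaddle/PaddleDetection.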

A Novel Deep Reinforcement Learning Based Automated Stock Trading System Using Cascaded LSTM Networks

Jie Zou , Jiashu Lou , Baohua Wang , Sixue Liu

Category: Artificial Intelligence

2022-12-06

More and more stock trading strategies are constructed using deep reinforcement learning (DRL) algorithms, but DRL methods originally widely used in the gaming community are not directly adaptable to financial data with low signal-to-noise ratios and unevenness, and thus suffer from performance shortcomings. In this paper, to capture the hidden information, we propose a DRL-based stock trading system using cascaded LSTM, which first uses an LSTM to extract time-series features from daily stock data; the extracted features are then fed to the agent for training, while the strategy functions in reinforcement learning also use another LSTM for training. Experiments on the DJI in the US market and the SSE50 in the Chinese stock market show that our model outperforms previous baseline models in terms of cumulative returns and Sharpe ratio, and this advantage is more significant in the Chinese stock market, an emerging market. This indicates that our proposed method is a promising way to build an automated stock trading system.
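
The two evaluation metrics named in this abstract, cumulative return and Sharpe ratio, can be sketched as follows. The abstract does not state the exact definitions used in the paper, so the sketch assumes the common textbook forms: compounded per-period returns, and mean excess return divided by sample standard deviation, annualized with 252 trading days. Function and parameter names are illustrative.

```python
import math
import statistics


def cumulative_return(period_returns):
    """Compounded return over all periods, e.g. 0.05 means +5%."""
    growth = 1.0
    for r in period_returns:
        growth *= 1.0 + r
    return growth - 1.0


def sharpe_ratio(period_returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period returns.

    risk_free_rate is per period. Requires at least two data points
    because the sample standard deviation is used.
    """
    excess = [r - risk_free_rate for r in period_returns]
    volatility = statistics.stdev(excess)
    if volatility == 0:
        raise ValueError("Sharpe ratio is undefined for zero volatility")
    return statistics.mean(excess) / volatility * math.sqrt(periods_per_year)


# Toy daily returns (made up for illustration, not data from the paper).
daily = [0.01, -0.005, 0.007, 0.002, -0.003]
print(f"cumulative return: {cumulative_return(daily):.4f}")
print(f"annualized Sharpe: {sharpe_ratio(daily):.2f}")
```

A higher Sharpe ratio means more return per unit of volatility, which is why it is usually reported alongside raw cumulative return when comparing trading strategies.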

Palm Vein Recognition via Multi-task Loss Function and Attention Layer

Jiashu Lou , Jie Zou , Baohua Wang

Category: Computer Vision | Machine Learning

2022-11-11

With improvements in the computing power of personal devices and in algorithm accuracy, biometric features are increasingly used for personal identification. Palm vein recognition offers rich extractable features and has been widely studied in recent years. However, traditional recognition methods lack robustness and are susceptible to environmental influences such as reflections and noise. In this paper, a convolutional neural network based on VGG-16 transfer learning, fused with an attention mechanism, is used as the feature extraction network on an infrared palm vein dataset. The palm vein classification task is first trained using palmprint classification methods, followed by matching using a similarity function, for which we propose a multi-task loss function to improve the accuracy of the matching task. To verify the robustness of the model, experiments were carried out on datasets from different sources. We then used K-means clustering to determine the adaptive matching threshold and achieved an accuracy of 98.89% on the prediction set. Matching is also efficient, taking an average of 0.13 seconds per palm vein pair, which suggests that our method can be adopted in practice.
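
One plausible reading of "K-means clustering to determine the adaptive matching threshold" is to cluster the similarity scores into two groups (matching and non-matching pairs) and place the threshold between the two centroids. The abstract does not describe the actual procedure, so the sketch below is an assumption throughout: a one-dimensional two-cluster K-means with the midpoint of the centroids as the threshold, on made-up scores.

```python
def two_means_threshold(scores, max_iters=100):
    """Threshold from 1-D K-means with k=2.

    Centroids start at the minimum and maximum score; the returned
    threshold is the midpoint between the two converged centroids.
    """
    low, high = min(scores), max(scores)
    if low == high:
        return low  # all scores identical; no separation possible
    c_low, c_high = low, high
    for _ in range(max_iters):
        # Assign each score to the nearer centroid.
        near_low = [s for s in scores if abs(s - c_low) <= abs(s - c_high)]
        near_high = [s for s in scores if abs(s - c_low) > abs(s - c_high)]
        new_low = sum(near_low) / len(near_low)
        new_high = sum(near_high) / len(near_high)
        if new_low == c_low and new_high == c_high:
            break  # converged
        c_low, c_high = new_low, new_high
    return (c_low + c_high) / 2.0


# Hypothetical similarity scores: impostor pairs low, genuine pairs high.
similarities = [0.10, 0.20, 0.15, 0.90, 0.85, 0.95]
threshold = two_means_threshold(similarities)
decisions = [s >= threshold for s in similarities]
print(threshold, decisions)
```

A pair is then declared a match when its similarity is at or above the threshold. The appeal of this design is that the threshold adapts to the score distribution of each dataset instead of being fixed by hand.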

PP-OCRv3: More Attempts for the Improvement of Ultra Lightweight OCR System

Chenxia Li , Weiwei Liu , Ruoyu Guo , Xiaoting Yin , Kaitao Jiang , Yongkun Du , Yuning Du , Lingfeng Zhu , Baohua Lai , Xiaoguang Hu

Category: Computer Vision

2022-06-07

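As shown in Figure 1, optical character recognition (OCR) technology is widely used in a variety of scenarios. Designing a practical OCR system remains a meaningful but challenging task. In previous work, with efficiency and accuracy in mind, we proposed a practical ultra-lightweight OCR system (PP-OCR) and an optimized version, PP-OCRv2. To further improve the performance of PP-OCRv2, this paper proposes a more robust OCR system, PP-OCRv3. PP-OCRv3 upgrades the text detection model and the text recognition model of PP-OCRv2 in 9 aspects. For the text detector, we introduce a PAN module with a large receptive field named LK-PAN, an FPN module with a residual attention mechanism named RSE-FPN, and the DML distillation strategy. For the text recognizer, the base model is changed from CRNN to SVTR, and we introduce the lightweight text recognition network SVTR LCNet, attention-guided training of CTC, the data augmentation strategy TextConAug, a better pre-trained model obtained with the self-supervised TextRotNet, UDML, and UIM to accelerate the model and improve its results. Experiments on real data show that, at comparable inference speed, the Hmean of PP-OCRv3 is 5% higher than that of PP-OCRv2. All of the above models are open source, and the code is available in the GitHub repository PaddleOCR, which is powered by PaddlePaddle.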

PP-HumanSeg: Connectivity-Aware Portrait Segmentation with a Large-Scale Teleconferencing Video Dataset

Lutao Chu , Yi Liu , Zewu Wu , Shiyu Tang , Guowei Chen , Yuying Hao , Juncai Peng , Zhiliang Yu , Zeyu Chen , Baohua Lai

Category: Computer Vision | Machine Learning

2021-12-14

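As the COVID-19 pandemic rampages across the world, the demand for video conferencing has surged. As a result, real-time portrait segmentation has become a popular feature for replacing the backgrounds of conference participants. While feature-rich datasets, models and algorithms are available for segmentation that extracts body postures from everyday scenes, portrait segmentation has not yet been well covered in the video conferencing context. To facilitate progress in this field, we introduce an open-source solution named PP-HumanSeg. This work is the first to construct a large-scale video portrait dataset, which contains 291 videos of conference scenes with 14K fine-labeled frames and extensions to multi-camera teleconferencing. Furthermore, we propose a novel semantic connectivity-aware learning (SCL) method for semantic segmentation, which introduces a semantic connectivity-aware loss to improve the quality of segmentation results from the perspective of connectivity. We also propose an ultra-lightweight model with SCL for practical portrait segmentation, which achieves the best trade-off between IoU and inference speed. Extensive evaluations on our dataset demonstrate the superiority of SCL and of our model. The source code is available at https://github.com/paddlepaddle/paddleseg.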
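
The IoU (intersection over union) metric that this abstract trades off against inference speed can be sketched for binary portrait masks as below. The mask layout (nested lists of 0/1) and the convention of returning 1.0 when both masks are empty are assumptions made for illustration, not details from the paper.

```python
def mask_iou(pred, target):
    """IoU between two binary masks given as equally sized 2-D lists.

    A pixel counts as foreground when its value is truthy. If both masks
    are entirely background, IoU is defined here as 1.0 (an assumption).
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p, t = bool(p), bool(t)
            intersection += p and t  # both foreground
            union += p or t          # either foreground
    return 1.0 if union == 0 else intersection / union


# Toy 2x2 masks: the prediction covers two pixels, the ground truth one.
predicted = [[1, 1],
             [0, 0]]
ground_truth = [[1, 0],
                [0, 0]]
print(mask_iou(predicted, ground_truth))
```

Here one pixel is shared and two pixels are foreground in at least one mask, so the IoU is 0.5.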

PP-MSVSR: Multi-Stage Video Super-Resolution

Lielin Jiang , Na Wang , Qingqing Dang , Rui Liu , Baohua Lai

Category: Computer Vision

2021-12-06

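Unlike the single image super-resolution (SISR) task, the key to the video super-resolution (VSR) task is to make full use of complementary information across frames to reconstruct the high-resolution sequence. Because images from different frames differ in motion and scene, accurately aligning multiple frames and effectively fusing them has always been a key research topic in VSR. To exploit the rich complementary information of neighboring frames, in this paper we propose a multi-stage VSR deep architecture, called PP-MSVSR, with a local fusion module, an auxiliary loss and a re-alignment module to progressively refine the enhanced result. Specifically, to strengthen the fusion of features across frames during feature propagation, a local fusion module is designed in stage 1 to perform local feature fusion before feature propagation. In addition, we introduce an auxiliary loss in stage 2 so that the features obtained by the propagation module retain more correlated information connected to the HR space, and a re-alignment module in stage 3 to make full use of the feature information from the previous stage. Extensive experiments confirm that PP-MSVSR achieves promising performance on the Vid4 dataset, reaching a PSNR of 28.13 dB with only 1.45M parameters. PP-MSVSR-L surpasses all state-of-the-art methods on the REDS4 dataset with a considerable number of parameters. Code and models will be released in PaddleGAN (https://github.com/paddlepaddle/paddlegan).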
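
For readers unfamiliar with the PSNR figure quoted for PP-MSVSR, the metric can be sketched as follows. This is the standard definition, PSNR = 10 * log10(MAX^2 / MSE), applied to flat lists of pixel values. The 8-bit peak of 255 and the flat-list layout are assumptions; the abstract does not say which channels or color space the paper evaluates on.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally long pixel lists."""
    if not reference or len(reference) != len(restored):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy example: every pixel is off by 10 gray levels, so MSE = 100.
reference_pixels = [0, 50, 100, 150]
restored_pixels = [10, 60, 110, 160]
print(f"{psnr(reference_pixels, restored_pixels):.2f} dB")
```

As a point of intuition, a uniform error of 10 gray levels on 8-bit data gives about 28.13 dB, the same order as the Vid4 result quoted in the abstract.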

Reference Twice: A Simple and Unified Baseline for Few-Shot Instance Segmentation

Yue Han , Jiangning Zhang , Zhucun Xue , Chao Xu , Xintian Shen , Yabiao Wang , Chengjie Wang , Yong Liu , Xiangtai Li

Category: Computer Vision

2023-01-03

Few-Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After these steps, performance on the novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset for the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots; for example, we boost nAP by +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few-Shot Object Detection. Code and model will be available.

AI in HCI Design and User Experience

Wei Xu

Category: Artificial Intelligence

2023-01-03

In this chapter, we review and discuss the transformation of AI technology in HCI/UX work and assess how AI technology will change how we do the work. We first discuss how AI can be used to enhance the result of user research and design evaluation. We then discuss how AI technology can be used to enhance HCI/UX design. Finally, we discuss how AI-enabled capabilities can improve UX when users interact with computing systems, applications, and services.
